AITopics | feature dimension

Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency

Neural Information Processing Systems, Jun-22-2026, 18:23:03 GMT

Linear attention has attracted interest as a computationally efficient approximation to softmax attention, especially for long sequences. Recent studies have explored distilling softmax attention in pre-trained Transformers into linear attention. However, a critical challenge remains: how to choose the feature dimension that governs the approximation quality. Existing methods fix this dimension uniformly across all attention layers, overlooking the layers' diverse roles and complexities. In this paper, we propose a principled method to automatically determine the feature dimension in linear attention using the concept of statistical degrees of freedom, which represent the effective dimensionality of the inputs. We provide a theoretical bound on the approximation error and show that the dimension chosen by our method achieves smaller errors under a fixed computational budget. Furthermore, we introduce an efficient layerwise training strategy to learn nonlinear features tailored to each layer. Experiments on multiple pre-trained Transformers demonstrate that our method improves the performance of distilled models compared to baselines without increasing the inference cost. Our findings also provide insight into how the complexity of the attention mechanism evolves across layers.
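As a rough illustration of "statistical degrees of freedom" as an effective dimensionality, the sketch below uses the standard ridge-regression definition N(λ) = Σ_i μ_i / (μ_i + λ) over the eigenvalues μ_i of an input covariance matrix. This is an assumption: the abstract does not state the exact quantity the authors compute, and the function names, the regularizer `lam`, and the synthetic "layer inputs" are all hypothetical.

```python
import numpy as np

def degrees_of_freedom(eigenvalues, lam):
    """Effective dimension N(lam) = sum_i mu_i / (mu_i + lam) (standard ridge definition)."""
    # Clip tiny negative eigenvalues that can appear from floating-point error.
    mu = np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None)
    return float(np.sum(mu / (mu + lam)))

def layer_degrees_of_freedom(X, lam):
    """Degrees of freedom of the empirical covariance of inputs X with shape (n_samples, d)."""
    cov = X.T @ X / X.shape[0]
    return degrees_of_freedom(np.linalg.eigvalsh(cov), lam)

rng = np.random.default_rng(0)
# Hypothetical inputs for two layers: one with rank-4 structure, one isotropic.
low_rank = rng.normal(size=(512, 4)) @ rng.normal(size=(4, 32))
isotropic = rng.normal(size=(512, 32))

dof_low = layer_degrees_of_freedom(low_rank, lam=0.1)
dof_iso = layer_degrees_of_freedom(isotropic, lam=0.1)
print(f"low-rank layer: {dof_low:.2f}, isotropic layer: {dof_iso:.2f}")
```

Under this definition, the rank-4 inputs can never exceed four degrees of freedom while the isotropic inputs approach the ambient dimension, which is the kind of per-layer difference a uniform feature dimension would ignore.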

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SAS: Simulated Attention Score

Neural Information Processing Systems, Jun-16-2026, 03:16:14 GMT

The attention mechanism is a core component of the Transformer architecture. Various methods have been developed to compute attention scores, including multi-head attention (MHA), multi-query attention, and group-query attention. We further analyze MHA and observe that its performance improves as the number of attention heads increases, provided the hidden size per head remains sufficiently large. Therefore, increasing both the head count and hidden size per head with minimal parameter overhead can lead to significant performance gains at a low cost. Motivated by this insight, we introduce Simulated Attention Score (SAS), which maintains a compact model size while simulating a larger number of attention heads and a larger hidden feature dimension per head. This is achieved by projecting a low-dimensional head representation into a higher-dimensional space, effectively increasing attention capacity without increasing parameter count. Beyond the head representations, we further extend the simulation approach to the feature dimension of the key and query embeddings, enhancing expressiveness by mimicking the behavior of a larger model while preserving the original model size. To control the parameter cost, we also propose Parameter-Efficient Attention Aggregation (PEAA). Comprehensive experiments on a variety of datasets and tasks demonstrate the effectiveness of the proposed SAS method, achieving significant improvements over different attention variants.
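To make the attention variants named above concrete, the following sketch counts key/value projection parameters when keys and values are shared across heads to different degrees. It does not implement SAS or PEAA. The helper name, the bias-free projections, and the sizes (`d_model = 1024`, 16 query heads, 4 key/value groups) are assumptions chosen only for illustration.

```python
def kv_projection_params(d_model, head_dim, n_kv_heads):
    """Parameters in the key and value projections, assuming no bias terms.

    Each projection maps d_model -> n_kv_heads * head_dim.
    """
    return 2 * d_model * n_kv_heads * head_dim

d_model, n_heads = 1024, 16
head_dim = d_model // n_heads  # 64 per head

# Multi-head attention: every query head has its own key/value head.
mha = kv_projection_params(d_model, head_dim, n_kv_heads=n_heads)
# Group-query attention: query heads share key/value heads in groups (4 groups here).
gqa = kv_projection_params(d_model, head_dim, n_kv_heads=4)
# Multi-query attention: all query heads share a single key/value head.
mqa = kv_projection_params(d_model, head_dim, n_kv_heads=1)

print(mha, gqa, mqa)
```

The counts show the trade-off the abstract starts from: sharing key/value heads cuts parameters, whereas adding heads at a fixed per-head size raises them, which is the cost SAS aims to avoid.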

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

[Figure labels: Training step; Preprocessing; f(x, v)]

Neural Information Processing Systems, May-1-2026, 02:04:01 GMT

In the following sections, we provide additional details about the network architecture, training, and experiments. The source code and WBC-SPH data set are published at https://github.com/ A.1 Implementation Details: We implement our neural network with TensorFlow (https://www.tensorflow.org). [...] They also serve as the basis for the implementation of our antisymmetric CConv (ASCC) layer. Axis for Mirroring: As mentioned in the main text, the mirror axis for ASCC layers can be chosen freely while fulfilling the requirements from theory. This provides a degree of freedom for implementation. We decided to use a fixed axis, which in our case corresponds to the spatial y-axis. While the mirroring could potentially be coupled to the spatial content of features, we found that a single, fixed axis for mirroring simplifies the implementation of the ASCCs, and hence is preferable in practice. Additional Modifications: In addition to the properties of our algorithm as discussed in Section 2.3 and the ablation study in Section 3, we normalize the input data depending on the given gravitational direction in the model.

artificial intelligence, machine learning, particle, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

A Proofs

Neural Information Processing Systems, Feb-16-2026, 14:41:03 GMT

Section A.1 presents the lemmas used to prove the main results. Section A.2 presents the main results. [...] The first two inequalities follow from the triangle inequality, and the third follows from the definition of the L-divergence in Eq. (5). We complete the proof by applying Lemma A.1 to bound [...] Following the conditions of Theorem 4.1, the upper bound of Var [...] Based on the conditions of Theorem 4.1, we assume [...] We complete the proof by applying Lemma A.3 and Lemma A.4 to bound the Rademacher [...] Following the proof of Theorem 4.1, we have |D [...] Following the conditions of Proposition 4.3, as N [...], we have [...] Based on the result of Proposition 4.3, for any δ ∈ (0, 1), we know that 4LB ( 2 D ln 2 + 1) [...] We complete the proof by applying the triangle inequality. III: Samples from p and q are labeled with 0 and 1, respectively. All values are averaged over five trials.

artificial intelligence, inequality, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

Learning Deep Bilinear Transformation for Fine-grained Image Representation

Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo

Neural Information Processing Systems, Feb-13-2026, 00:53:39 GMT

Bilinear feature transformation has shown state-of-the-art performance in learning fine-grained image representations.

antic group, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > Monroe County > Rochester (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Supplementary Materials for "DropCov: A Simple yet Effective Method for Improving Deep Architectures"

Qilong Wang

Neural Information Processing Systems, Feb-12-2026, 05:53:15 GMT

Our proposed DropCov can be flexibly integrated with existing deep architectures (e.g., CNNs [...]). Qinghua Hu is the corresponding author and is with the Engineering Research Center of City Intelligence and Digital Governance, Ministry of Education of the People's Republic of China. [...] VGG-VD on three small-scale fine-grained datasets) show that 0.5 is the best choice of [...] As listed in Table S2, a single LT module brings a small gain for plain GCP. Compared to B-CNN + LT (79.62% training accuracy), plain GCP [...] GCP + LT, while B-CNN + LT achieves a significant improvement over B-CNN and plain GCP. On the contrary, the samples involving less redundant information (e.g., scene) have large [...] These phenomena are consistent with our finding. Is second-order information helpful for large-scale visual recognition?

artificial intelligence, dropcov model, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Tianjin Province > Tianjin (0.05)
Asia > China > Liaoning Province > Dalian (0.04)

Genre: Research Report > New Finding (0.49)

Industry: Government > Regional Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

c3ba4962c05c49636d4c6206a97e9c8a-Paper-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 19:16:29 GMT

arxiv preprint arxiv, quantization, transformer, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

A Appendix for OPERA Contents

Neural Information Processing Systems, Feb-10-2026, 14:49:44 GMT

The five datasets used cover a wide range of respiratory medical conditions.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Asia > Taiwan (0.05)
Europe > Portugal > Aveiro > Aveiro (0.04)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)
(7 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Algorithm 1: Pseudocode of PIC in a PyTorch-like style

Neural Information Processing Systems, Feb-9-2026, 22:01:52 GMT

Linear Evaluation Protocol: In linear evaluation, we follow the common setting [6, 5] to freeze the backbone of ResNet-50 and train a supervised linear classifier on the global average pooling features for 100 epochs. Note that the 2-layer head in unsupervised pre-training is not used in the linear evaluation stage. During training, we augment the image with random scaling from 0.5 to 2.0, crop size of 769 and random flip. The top-1 and top-5 accuracy results are reported in Table 9. From the perspective of optimization goals, the only difference between the parametric instance classification framework and supervised classification framework is how to define the classes for each instance.
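The linear evaluation recipe described above (frozen backbone, global average pooling, supervised linear classifier) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. The feature maps are synthetic stand-ins for frozen backbone outputs rather than ResNet-50 activations, and the function names, learning rate, and class structure are hypothetical.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average over spatial dimensions: (N, C, H, W) -> (N, C)."""
    return feature_maps.mean(axis=(2, 3))

def train_linear_classifier(features, labels, n_classes, epochs=100, lr=0.1):
    """Softmax regression on fixed features with full-batch gradient descent."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
# Synthetic "frozen backbone" outputs: each class activates a different channel.
class_means = 3.0 * np.eye(3, 8)
feature_maps = class_means[labels][:, :, None, None] + 0.1 * rng.normal(size=(300, 8, 4, 4))

features = global_average_pool(feature_maps)  # the backbone itself is never updated
W, b = train_linear_classifier(features, labels, n_classes=3)
accuracy = float(((features @ W + b).argmax(axis=1) == labels).mean())
print(f"top-1 accuracy on the synthetic features: {accuracy:.3f}")
```

Only `W` and `b` receive gradient updates, which is what makes the resulting accuracy a measure of the frozen representation rather than of further fine-tuning.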

artificial intelligence, machine learning, pytorch-likestyle, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)
